# Multimodal intelligent agent
UI-TARS-72B-DPO
Apache-2.0
UI-TARS is a next-generation native GUI agent model with human-like perception, reasoning, and action capabilities, designed to interact directly with graphical user interfaces (GUIs).
Multimodal Fusion
Transformers
Supports Multiple Languages

parasail-ai
179
0
VideoMind-2B-FT-QVHighlights
BSD-3-Clause
VideoMind is a multimodal agent framework that improves video reasoning by emulating human-like cognitive processes.
Video-to-Text
Safetensors
yeliudev
20
0
Featured Recommended AI Models